Eleventy Customizations

...

Syntactically Awesome Style Sheets

npm install --global sass

...
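As a minimal sketch of one way to wire Dart Sass into an Eleventy build (the paths, the file names, and the use of the beforeBuild event are assumptions, and the sass package must also be resolvable by Node, for example through a local install), the stylesheets can be compiled and compressed from the Eleventy configuration:
			const fs = require("fs");
			const sass = require("sass");

			module.exports = function (eleventyConfig) {
				// Recompile the stylesheets whenever the Sass sources change.
				eleventyConfig.addWatchTarget("./css/");

				eleventyConfig.on("beforeBuild", function () {
					// Hypothetical paths; adjust to the actual directory structure.
					let result = sass.renderSync({
						file: "css/styles.scss",
						outputStyle: "compressed"
					});
					fs.mkdirSync("_site/css", { recursive: true });
					fs.writeFileSync("_site/css/styles.css", result.css);
				});
			};
Alternatively, the globally installed sass command can be run outside of Eleventy to produce the same compiled output.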

...Text... File Minification

...

Minify HTML With Transform
const htmlmin = require("html-minifier");

			module.exports = function (eleventyConfig) {
				eleventyConfig.addTransform("htmlmin", function (content, outputPath) {
					// Use this.inputPath and this.outputPath for Eleventy 1.0+.
					if (outputPath && outputPath.endsWith(".html")) {
						let minified = htmlmin.minify(content, {
							useShortDoctype: true,
							removeComments: true,
							collapseWhitespace: true
						});
						return minified;
					}
					return content;
				});
			};
Log Warning For Certain Words
module.exports = function (eleventyConfig) {
				eleventyConfig.addLinter("inclusive-language", function (content, inputPath, outputPath) {
					let words = "simply,obviously,basically,clearly,easy".split(",");
					// Use this.inputPath and this.outputPath for Eleventy 1.0+.
					if (inputPath.endsWith(".md")) {
						for (let word of words) {
							let regexp = new RegExp("\\b(" + word + ")\\b", "gi");
							if (content.match(regexp)) {
								console.warn(`Inclusive Language Linter (${inputPath}) Found: ${word}`);
							}
						}
					}
				});
			};

Create Robots.TXT Reference

A robots exclusion protocol provides instructions to web robots on how to crawl pages of a website. It has become a standard which web robots are expected to follow, with Google having proposed an official standard under the Internet Engineering Task Force (IETF). Essentially, the robots exclusion protocol contains instructions indicating which pages web robots can and cannot access. It should be noted that these instructions are only requests; access is not actually denied, and the protocol relies on compliance from web robots (malicious web robots are unlikely to honour these instructions). The robots exclusion protocol must be included in the root of the website hierarchy at https://site.example.com/robots.txt (if this file does not exist, it is assumed that there are no limitations on crawling the website). The standard has wide adoption from most popular search engines, including support from Google, Microsoft, and Yahoo. Notably, the Wayback Machine does not respect a robots exclusion protocol.

In the robots exclusion protocol, directives are issued for a specific user agent as either allow or disallow. The user agent is the specific web robot for which the directives are issued, where the common web robots are Googlebot, Bingbot, and Slurp. An allow directive will only allow the user agent to have access to the listed directories and files (in other words, the user agent will not have access to any directory or file which is not listed). A disallow directive will only disallow the user agent from having access to the listed directories and files (in other words, the user agent will have access to any directory or file which is not listed). It should be emphasized that multiple directives cannot be issued in a single line. There are non-standard extensions, including a crawl-delay directive (to throttle visits to the host), sitemap, and host, but these are not supported by all web robots.

Allow all web robots to access all pages of a website (equivalently, disallow no pages for any web robot):
			User-Agent: *
			Allow: /

			User-Agent: *
			Disallow:
Allow no web robots to access any pages of a website (equivalently, disallow all pages for all web robots):
			User-Agent: *
			Allow:

			User-Agent: *
			Disallow: /
Example of a robots exclusion protocol with various declarations for different common user agents:
			# Comments are prefixed with an octothorpe.
			User-Agent: *
			Disallow: /drafts/
			Disallow: /secret/letters.html
			Disallow: *.gif$
			User-Agent: Foobot
			Disallow: /
			Allow: /hidden/page.html
			User-Agent: Googlebot
			User-Agent: Bingbot
			Disallow: /hidden/page.html

If it is not possible to define...

https://www.robotstxt.org/faq/noindex.html

Create Sitemap.XML Reference

A sitemap is a file which lists URLs for webpages on a website along with additional metadata about each URL. This can be helpful to allow search engines to index the pages of a website which are available for crawling (complementary to other methods which look at links within the website and from other websites). As of 2022-12, the latest version of the protocol, Sitemap 0.90, is offered under the terms of the Attribution-ShareAlike Creative Commons Licence and has wide adoption from most popular search engines, including support from Google, Microsoft, and Yahoo. Usually, the sitemap is included in the root of the website hierarchy at https://site.example.com/sitemap.xml. (It should be noted that for small websites, typically fewer than 500 pages, a sitemap is not required for indexing in most cases.)

The sitemap usually uses an XML schema and UTF-8 encoding. The compulsory tags include the urlset tag to encapsulate all entries with the current protocol standard, the url tag as the parent for each entry, and the loc tag for the location of the entry as a child (which must begin with the protocol and end with a trailing slash). The optional tags are relevant as children of a url entry and include the last modified date (lastmod, specified in W3C Datetime format, as YYYY-MM-DD), the frequency at which the entry is likely to change (changefreq, with valid values of always, hourly, daily, weekly, monthly, yearly, and never), and the priority of the entry relative to other entries in the sitemap (priority, with valid values ranging from 0.0 to 1.0 and a default value of 0.5). The location of the sitemap is usually specified in the robots.txt reference. If necessary, multiple sitemap files can be grouped with a sitemap index, but any sitemap must be from a single host or domain and can only contain entries relative to its location. It should be noted that it is necessary to use entity escape codes for certain special characters (including & (&amp;), ' (&apos;), " (&quot;), > (&gt;), < (&lt;), and spaces (&nbsp;)) and extended ASCII characters (non-alphanumeric and non-latin). For example, a hypothetical location such as https://site.example.com/archive?year=2020&month=09 would be written inside the location tag as https://site.example.com/archive?year=2020&amp;month=09.

Generate a sitemap with all pages part of collections (exclude from collections to exclude from the sitemap):
			---
			permalink: /sitemap.xml
			eleventyExcludeFromCollections: true
			---
			<?xml version = "1.0" encoding = "utf-8"?>
			<urlset xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9">
			    {% for page in collections.all %}
			    <url>
			        <loc>{{ site.url }}{{ page.url | url }}</loc>
			        <lastmod>{{ page.date.toISOString() }}</lastmod>
			        <changefreq>{{ page.data.changeFreq if page.data.changeFreq else "yearly" }}</changefreq>
			        <priority>{{ page.data.priority if page.data.priority else 0.8 }}</priority>
			    </url>
			    {% endfor %}
			</urlset>
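The site.url value used in the sitemap and robots.txt templates is not provided by Eleventy itself; a minimal sketch of one way to supply it is a global data file (the file name _data/site.js and the URL shown are assumptions to adapt to the actual website):
			// _data/site.js (hypothetical global data file read by Eleventy).
			module.exports = {
				url: "https://site.example.com"
			};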
Example of a simple sitemap for a website with location, last modified date, change frequency, and priority:
			<?xml version = "1.0" encoding = "utf-8"?>
			<urlset xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9">
			    <url>
			        <loc>https://site.example.com/</loc>
			        <lastmod>2020-09-26T00:00:00.000Z</lastmod>
			        <changefreq>monthly</changefreq>
			        <priority>0.8</priority>
			    </url>
			    <url>
			        <loc>https://site.example.com/404.html</loc>
			        <lastmod>2020-09-26T00:00:00.000Z</lastmod>
			        <changefreq>yearly</changefreq>
			        <priority>0.2</priority>
			    </url>
			    <url>
			        <loc>https://site.example.com/About/</loc>
			        <lastmod>2021-07-01T00:00:00.000Z</lastmod>
			        <changefreq>yearly</changefreq>
			        <priority>0.6</priority>
			    </url>
			</urlset>
Add a declaration of the sitemap to the robots.txt reference:
				---
				permalink: /robots.txt
				eleventyExcludeFromCollections: true
				---
				User-Agent: *
				Allow: /
				Disallow:
				Sitemap: {{ site.url }}/sitemap.xml

Download Clean Template

A clean template with a personally customized workflow can be downloaded. This website uses a similar template, which is designed with a specific directory structure to hide the background configuration. The setup requires the official distribution of Dart Sass for Node to compile the Sass to CSS and ...for minification of the HTML and JavaScript... . It should be recognized that this template was based on and tested using a global installation of Eleventy v0.12.1, and it may have become outdated with later versions of Eleventy.

Figure of the directory structure.